API design for machine learning software: experiences from the scikit-learn project

Authors

  • Lars Buitinck
  • Gilles Louppe
  • Mathieu Blondel
  • Fabian Pedregosa
  • Andreas Mueller
  • Olivier Grisel
  • Vlad Niculae
  • Peter Prettenhofer
  • Alexandre Gramfort
  • Jaques Grobler
  • Robert Layton
  • Jacob VanderPlas
  • Arnaud Joly
  • Brian Holt
  • Gaël Varoquaux
Abstract

scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
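The shared interface and its composition benefits mentioned in the abstract can be illustrated with a minimal, pure-Python sketch. It is not scikit-learn's own code: the classes `MeanCenterer`, `CentroidClassifier`, and `SimplePipeline` are hypothetical names introduced here for illustration. They only mimic the `fit` / `transform` / `predict` convention the library is known for, including the habit of storing learned state in attributes with a trailing underscore.

```python
from statistics import mean


class MeanCenterer:
    """Hypothetical transformer: learns column means in fit, subtracts them in transform."""

    def fit(self, X, y=None):
        # Learned state is stored in an attribute with a trailing underscore.
        self.means_ = [mean(col) for col in zip(*X)]
        return self  # returning self allows call chaining

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class CentroidClassifier:
    """Hypothetical predictor: assigns each sample to the class with the closest centroid."""

    def fit(self, X, y):
        self.centroids_ = {}
        for label in sorted(set(y)):
            rows = [row for row, target in zip(X, y) if target == label]
            self.centroids_[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
            for row in X
        ]


class SimplePipeline:
    """Hypothetical composite: chains transformers and a final predictor.

    Because it exposes fit and predict itself, it can be used anywhere a
    single estimator is expected.
    """

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


# Toy data: two well-separated classes in two dimensions.
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = ["a", "a", "b", "b"]

model = SimplePipeline([MeanCenterer(), CentroidClassifier()]).fit(X_train, y_train)
predictions = model.predict([[0.5, 0.5], [5.5, 4.0]])
print(predictions)
```

Since every object follows the same small set of methods, the composite needs no knowledge of what its steps compute, which is the kind of reusability the abstract refers to.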

Similar resources

Seglearn: A Python Package for Learning Sequences and Time Series

seglearn is an open-source Python package for machine learning on time series or sequences using a sliding window segmentation approach. The implementation provides a flexible pipeline for tackling classification, regression, and forecasting problems with multivariate sequence and contextual data. This package is compatible with scikit-learn and is listed under scikit-learn "Related Projects". The...

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

imbalanced-learn is an open-source Python toolbox that provides a wide range of methods for coping with imbalanced datasets, a problem frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods can be categorized into 4 groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning m...

Scikit-learn: Machine Learning in Python

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distrib...

TF.Learn: TensorFlow's High-level Module for Distributed Machine Learning

TF.Learn is a high-level Python module for distributed machine learning inside TensorFlow (Abadi et al., 2015). It provides an easy-to-use Scikit-learn (Pedregosa et al., 2011) style interface to simplify the process of creating, configuring, training, evaluating, and experimenting with a machine learning model. TF.Learn integrates a wide range of state-of-the-art machine learning algorithms built on top...

Flexible State-Merging for Learning (P)DFAs in Python

We present a Python package for learning (non-)probabilistic deterministic finite state automata and provide heuristics in the red-blue framework. As our package is built along the API of the popular scikit-learn package, it is easy to use and new learning methods are easy to add. It provides PDFA learning as an additional tool for sequence prediction or classification to data scientists, witho...

Journal:
  • CoRR

Volume abs/1309.0238

Publication date 2013